J’importe les librairies
| libraries |
|---|
| knitr |
| FactoMineR |
| factoextra |
| gprofiler2 |
| pheatmap |
| dplyr |
Samples from two mouse models were collected. The first one is a reversible chemical-induced injury model (folic acid (FA) induced nephropathy). The second one is an irreversible surgically-induced fibrosis model (unilateral ureteral obstruction (UUO)). mRNA and small RNA sequencing, as well as 10-plex tandem mass tag (TMT) proteomics were performed with kidney samples from different time points over the course of fibrosis development.
Rappel sur les échantillons:
Deux modèles de fibrose rénale chez la souris sont étudiés:
Le premier est un modèle de néphropathie réversible induite par l’acide folique (folic acid (FA)). Les souris ont été sacrifiées avant le traitement (normal), puis à jour 1, 2, 7 et 14 (day1,…) après une seule injection d’acide folique.
Le second est un modèle irréversible induit chrirurgicalement (unilateral ureteral obstruction (UUO)). les souris ont été sacrifiées avant obstruction (day 0) et à 3, 7 et 14 jours après obstruction par ligation de l’uretère du rein gauche.
A partir de ces extraits de rein, l’ARN messager total et les petits ARNs ont été séquencés et les protéines caratérisées par spectrométrie de masse en tandem (TMT).
Supplementary material of the article with all the data tables (zip archive):
Les données se trouvent aussi dans le dépôt github
Les comptages du modèle FA sont dans tables/fa/results/counts/ pour la transcriptomique tables/pfa/results/counts/ pour la protéomique
Les comptages du modèle UUO sont dans tables/uuo/results/counts/ pour la transcriptomique tables/puuo/results/counts/ pour la protéomique
Nous travaillerons sur le modèle FA.
Analyse d’expression différentielle pour les données de protéomique et transcriptomique => identifier les gènes/protéines significativement différentiellement exprimés dans le modèle FA en comparant Day 7 à Day 0.
Analyse multi-omique (transcripto + protéo) avec, au choix, MOFA, mixOmics, mixKernel ou d’autres outils de factorisation multi-matrices. Vous pouvez soit vous focaliser sur un time point, soit intégrer les différents time points, en partant des données normalisées fournies dans le matériel supplémentaire du papier.
Construction de réseau avec WGCNA à partir des données transcriptomiques.
Reconstruct the co-expression network from all the time points of the FA transcriptomics data. Propose to filter and remove all the zero expressed genes, the NAs and the less informative genes from the transcriptomics data. (I remove all the genes that are not expressed in at least 9 out of the 18 conditions (expression > 1 TPM in 9) and then filter with the coefficient of variation > 0.75).
Then apply the first part of the network reconstruction steps as we saw them on the WGCNA course until the module predictions.
Instead of using WGCNA’s module prediction routines, apply a universal threshold of 0.5 on the adjacency matrix, and obtain an adjacency matrix that is reduced in size. This is the network. Import it to Cytoscape with aMatReader plugin.
Visualize, analyze the network and superimpose the proteomics data on it.
Colorez dans le réseau choisi les noeuds en fonction des données de protéomiques avec un gradient de couleur correspondant au fold-change des données de protéomique.
Vous fournirez un rapport au format pdf généré à partir d’un Rmd (déposez-nous impérativement les 2 fichiers, Rmd et pdf, avec comme nom de fichier “NOM-PRENOM_evaluation-m6-2021” + .Rmd ou .pdf dans le dossier /shared/projects/dubii2021/
Vos travaux doivent être reproductibles, pensez à décrire et justifier les différentes étapes, seuils, extractions … Bon courage ! Nous sommes disponibles sur Slack en cas de besoin.
Je télécharge les quatre fichiers dans un dossier local ~/Module6Projet, et les charge dans les data.frames suivants:
fa_exprfa_metapfa_exprpfa_metaJ’importe directement les données normalisées
#' @title Download a file only if it is not yet here
#' @author Jacques van Helden email{Jacques.van-Helden@@france-bioinformatique.fr}
#' @param url_base base of the URL, that will be prepended to the file name
#' @param file_name name of the file (should not contain any path)
#' @param local_folder path of a local folder where the file should be stored
#' @return the function returns the path of the local file, built from local_folder and file_name
#' @export
downloadOnlyOnce <- function(url_base,
file_name,
local_folder) {
## Define the source URL
url <- file.path(url_base, file_name)
message("Source URL\n\t", url)
## Define the local file
local_file <- file.path(local_folder, file_name)
## Create the local data folder if it does not exist
dir.create(local_folder, showWarnings = FALSE, recursive = TRUE)
## Download the file ONLY if it is not already there
if (!file.exists(local_file)) {
message("Downloading file from source URL to local file\n\t",
local_file)
download.file(url = url, destfile = local_file)
} else {
message("Local file already exists, no need to download\n\t",
local_file)
}
return(local_file)
}## Specify the basic parameters
pavkovic_base <- "https://github.com/DU-Bii/module-3-Stat-R/tree/master/stat-R_2021/data/pavkovic_2019"
pavkovic_folder <- "~/DUBii-m3_data/pavkovic_2019"
#### Dowload folic acid data and metadata ####
## Transcriptome data table
local_fa_file <- downloadOnlyOnce(
url_base = pavkovic_base,
file_name = "fa_raw_counts.tsv.gz",
local_folder = pavkovic_folder
)
## Transcriptome data table normalized
local_fa_file_norm <- downloadOnlyOnce(
url_base = pavkovic_base,
file_name = "fa_normalized_counts.tsv.gz",
local_folder = pavkovic_folder
)
## Transcriptome metadata
trans_metadata_file <- downloadOnlyOnce(
url_base = pavkovic_base,
file_name = "fa_transcriptome_metadata.tsv",
local_folder = pavkovic_folder
)
## Proteome data table
local_pfa_file <- downloadOnlyOnce(
url_base = pavkovic_base,
file_name = "pfa_model_counts.tsv.gz",
local_folder = pavkovic_folder
)
## Proteome data table normalized
local_pfa_file_norm <- downloadOnlyOnce(
url_base = pavkovic_base,
file_name = "pfa_model_log2_counts.tsv.gz",
local_folder = pavkovic_folder
)
## Proteome metadata
prot_metadata_file <- downloadOnlyOnce(
url_base = pavkovic_base,
file_name = "pfa_proteome_metadata.tsv",
local_folder = pavkovic_folder
)#' @title Load a tab-separated value file and manually set row ames after having forced them to unique values
#' @author Jacques van Helden email{Jacques.van-Helden@@france-bioinformatique.fr}
#' @param file file path
#' @param header=1 Header is set to 1 by default
#' @param sep="\t" Column separator is set to tab by default
#' @param rownames.col=1 Column containing the row names
#' @param ... all other parameters are passed to read.delim()
#' @return a data frame with the loaded data
load_fix_row_names <- function(file,
header = 1,
sep = "\t",
rownames.col = 1,
...) {
x <- read.delim(file = file, ...)
rownames(x) <- make.names(x[, rownames.col], unique = TRUE)
x <- x[, -rownames.col]
return(x)
}## Load transcriptome data
fa <- read.delim(file = local_fa_file, sep = "\t", header = TRUE)
## Load same data with load_fix_row_names
fa_expr <- load_fix_row_names(file = local_fa_file, rownames.col = 1)
kable(head(fa_expr), caption = "Loaded with myEasyLad() fa")| day1_1 | day1_2 | day1_3 | day14_1 | day14_2 | day14_3 | day2_1 | day2_2 | day2_3 | day3_1 | day3_2 | day3_3 | day7_1 | day7_2 | day7_3 | normal_1 | normal_2 | normal_3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSMUSG00000000001 | 2278.80022 | 1786.498848 | 2368.618959 | 627.758017 | 559.156031 | 611.4338605 | 2145.223456 | 262.454849 | 745.843632 | 987.1850529 | 1077.645231 | 1335.116771 | 1096.075988 | 1035.8458456 | 1090.0375905 | 483.2298191 | 1842.14841 | 475.696800 |
| ENSMUSG00000000003 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0000000 | 0.000000 | 0.000000 | 0.000000 | 0.0000000 | 0.000000 | 0.000000 | 0.000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00000 | 0.000000 |
| ENSMUSG00000000028 | 36.27547 | 22.147861 | 39.484949 | 14.470759 | 10.167813 | 31.6910193 | 300.558779 | 4.771672 | 123.896184 | 51.8555945 | 8.434511 | 69.936866 | 6.665195 | 6.9552273 | 42.5783251 | 7.3515305 | 11.19676 | 1.034465 |
| ENSMUSG00000000031 | 13.18853 | 7.151932 | 1.115304 | 0.867429 | 0.000000 | 0.0000000 | 1.711944 | 0.000000 | 5.260747 | 0.8022308 | 0.000000 | 0.000000 | 0.000000 | 0.8489214 | 1.7106814 | 0.8603067 | 0.00000 | 0.000000 |
| ENSMUSG00000000037 | 0.00000 | 27.903213 | 6.897842 | 5.692254 | 1.901719 | 0.6549762 | 57.382077 | 0.000000 | 38.898261 | 8.9308534 | 6.966606 | 0.000000 | 7.939323 | 101.6481244 | 0.6500858 | 32.0575944 | 10.42322 | 0.000000 |
| ENSMUSG00000000049 | 30.86001 | 4.861367 | 51.466810 | 26.152649 | 1.968290 | 55.8319868 | 10.796186 | 2.747397 | 1.805883 | 0.9468345 | 26.954566 | 7.666032 | 32.235700 | 6.7001171 | 33.9132330 | 27.7647071 | 38.42445 | 15.903837 |
## Load transcriptome data normalized
fa_norm <- read.delim(file = local_fa_file_norm, sep = "\t", header = TRUE)
## Load same data with load_fix_row_names
fa_expr_norm <- load_fix_row_names(file = local_fa_file_norm, rownames.col = 1)
kable(head(fa_expr_norm), caption = "Loaded with myEasyLad() fa normalized")| day1_1 | day1_2 | day1_3 | day14_1 | day14_2 | day14_3 | day2_1 | day2_2 | day2_3 | day3_1 | day3_2 | day3_3 | day7_1 | day7_2 | day7_3 | normal_1 | normal_2 | normal_3 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSMUSG00000000001 | 2711.22417 | 1094.745589 | 1331.5880610 | 728.668808 | 603.846446 | 646.124076 | 1185.658698 | 1239.4695 | 829.596362 | 928.1069573 | 1110.908359 | 951.071644 | 953.722863 | 1008.0265279 | 1001.4667413 | 719.032852 | 1247.299971 | 809.375515 |
| ENSMUSG00000000003 | 0.00000 | 0.000000 | 0.0000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0000 | 0.000000 | 0.0000000 | 0.000000 | 0.000000 | 0.000000 | 0.0000000 | 0.0000000 | 0.000000 | 0.000000 | 0.000000 |
| ENSMUSG00000000028 | 42.82759 | 13.485108 | 21.9214582 | 16.244209 | 10.802262 | 33.839559 | 166.379146 | 23.6540 | 137.895374 | 48.8972257 | 8.244218 | 49.868925 | 6.091296 | 6.8109901 | 39.5074036 | 10.420766 | 7.448588 | 1.700369 |
| ENSMUSG00000000031 | 15.46552 | 4.290716 | 0.5620887 | 1.160301 | 0.000000 | 0.000000 | 1.105509 | 0.0000 | 5.560297 | 0.9403313 | 0.000000 | 0.000000 | 0.000000 | 0.9729986 | 1.8375537 | 1.488681 | 0.000000 | 0.000000 |
| ENSMUSG00000000037 | 0.00000 | 17.162865 | 3.9346207 | 6.961804 | 2.160452 | 1.057486 | 31.507014 | 0.0000 | 43.370319 | 8.4629814 | 7.213691 | 0.000000 | 6.961481 | 99.2458551 | 0.9187768 | 47.637787 | 6.771444 | 0.000000 |
| ENSMUSG00000000049 | 36.87931 | 3.064797 | 28.6665222 | 30.167817 | 2.160452 | 59.219228 | 6.080301 | 14.1924 | 2.224119 | 0.9403313 | 27.824235 | 5.699306 | 27.845923 | 6.8109901 | 31.2384121 | 41.683064 | 25.731487 | 27.205900 |
## Load proteome data
pfa_expr <- load_fix_row_names(file = local_pfa_file, rownames.col = 1)
kable(head(pfa_expr), caption = "Loaded with myEasyLad() pfa")| normal_1 | normal_2 | day1_1 | day1_2 | day2_1 | day2_2 | day7_1 | day7_2 | day14_1 | day14_2 | |
|---|---|---|---|---|---|---|---|---|---|---|
| ENSMUSG00000037686 | 531.2680 | 651.7200 | 335.5910 | 334.8460 | 197.1740 | 307.194 | 123.2060 | 272.6190 | 93.7247 | 196.1590 |
| ENSMUSG00000027831 | 221.6020 | 266.3590 | 175.4090 | 159.4190 | 234.8080 | 256.927 | 149.9380 | 315.0590 | 110.5880 | 126.3600 |
| ENSMUSG00000039201 | 26.0723 | 29.1331 | 57.7329 | 45.8475 | 81.6009 | 88.870 | 29.8560 | 44.5586 | 13.6292 | 27.6568 |
| ENSMUSG00000031095 | 4363.0500 | 4784.0800 | 4064.4800 | 3917.2900 | 4599.0300 | 5957.030 | 2806.5200 | 6792.0900 | 2022.5100 | 3226.7500 |
| ENSMUSG00000034931 | 879.2790 | 1065.3900 | 914.2870 | 928.2760 | 1000.1000 | 1264.270 | 738.3520 | 1362.0700 | 466.4440 | 714.5870 |
| ENSMUSG00000038208 | 68.2225 | 89.8871 | 57.9041 | 76.3510 | 84.8474 | 105.245 | 72.8696 | 138.9810 | 40.8101 | 59.2117 |
## Load proteome data normalized
pfa_expr_norm <- load_fix_row_names(file = local_pfa_file_norm, rownames.col = 1)
kable(head(pfa_expr_norm), caption = "Loaded with myEasyLad() pfa normalized")| normal_1 | normal_2 | day1_1 | day1_2 | day2_1 | day2_2 | day7_1 | day7_2 | day14_1 | day14_2 | |
|---|---|---|---|---|---|---|---|---|---|---|
| ENSMUSG00000037686 | 5.1206348 | 5.0327406 | 4.506491 | 4.492381 | 3.628542 | 3.932456 | 3.530692 | 3.685216 | 3.665305 | 4.026119 |
| ENSMUSG00000027831 | 3.8601167 | 3.7429066 | 3.571461 | 3.422865 | 3.880247 | 3.674968 | 3.813620 | 3.893704 | 3.903714 | 3.392438 |
| ENSMUSG00000039201 | 0.7849136 | 0.5645967 | 1.972258 | 1.630415 | 2.358439 | 2.146880 | 1.492104 | 1.081497 | 0.894504 | 1.208630 |
| ENSMUSG00000031095 | 8.1578622 | 7.9080330 | 8.103830 | 8.039705 | 8.170497 | 8.208356 | 8.038369 | 8.322345 | 8.095098 | 8.064739 |
| ENSMUSG00000034931 | 5.8472466 | 5.7415172 | 5.951775 | 5.962765 | 5.969607 | 5.972364 | 6.112220 | 6.004585 | 5.979016 | 5.890150 |
| ENSMUSG00000038208 | 2.1641269 | 2.1791889 | 1.976512 | 2.363160 | 2.414548 | 2.390031 | 2.774423 | 2.714986 | 2.468220 | 2.301407 |
## Load transcriptome metadata
fa_meta <- read.delim(file = trans_metadata_file, sep = "\t", header = TRUE)
kable(fa_meta, caption = "Metadata for the transcriptome dataset fa")| dataType | sampleName | condition | sampleNumber | color |
|---|---|---|---|---|
| 1 transcriptome | day14_1 | day14 | 1 | #FF4400 |
| 2 transcriptome | day14_2 | day14 | 2 | #FF4400 |
| 3 transcriptome | day14_3 | day14 | 3 | #FF4400 |
| 4 transcriptome | day1_1 | day1 | 1 | #BBD7FF |
| 5 transcriptome | day1_2 | day1 | 2 | #BBD7FF |
| 6 transcriptome | day1_3 | day1 | 3 | #BBD7FF |
| 7 transcriptome | day2_1 | day1 | 1 | #F0BBFF |
| 8 transcriptome | day2_2 | day1 | 2 | #F0BBFF |
| 9 transcriptome | day2_3 | day1 | 3 | #F0BBFF |
| 10 transcriptome | day3_1 | day3 | 1 | #FFFFDD |
| 11 transcriptome | day3_2 | day3 | 2 | #FFFFDD |
| 12 transcriptome | day3_3 | day3 | 3 | #FFFFDD |
| 13 transcriptome | day7_1 | day7 | 1 | #FFDD88 |
| 14 transcriptome | day7_2 | day7 | 2 | #FFDD88 |
| 15 transcriptome | day7_3 | day7 | 3 | #FFDD88 |
| 16 transcriptome | normal_1 | normal | 1 | #BBFFBB |
| 17 transcriptome | normal_2 | normal | 2 | #BBFFBB |
| 18 transcriptome | normal_3 | normal | 3 | #BBFFBB |
## Load proteome metadata
pfa_meta <- read.delim(file = prot_metadata_file, sep = "\t", header = TRUE)
kable(pfa_meta, caption = "Metadata for the proteome dataset pfa")| dataType | sampleName | condition | sampleNumber | color |
|---|---|---|---|---|
| proteome | normal_1 | normal | 1 | #BBFFBB |
| proteome | normal_2 | normal | 2 | #BBFFBB |
| proteome | day1_1 | day1 | 1 | #FFFFDD |
| proteome | day1_2 | day1 | 2 | #FFFFDD |
| proteome | day2_1 | day2 | 1 | #BBD7FF |
| proteome | day2_2 | day2 | 1 | #BBD7FF |
| proteome | day3_1 | day3 | 1 | #F0BBFF |
| proteome | day3_2 | day3 | 2 | #F0BBFF |
| proteome | day7_1 | day7 | 1 | #FFDD88 |
| proteome | day7_2 | day7 | 2 | #FFDD88 |
| proteome | day14_1 | day14 | 1 | #FF4400 |
| proteome | day14_2 | day14 | 2 | #FF4400 |
Structure de chaque dataframe.
str(fa_expr)'data.frame': 46679 obs. of 18 variables:
$ day1_1 : num 2278.8 0 36.3 13.2 0 ...
$ day1_2 : num 1786.5 0 22.15 7.15 27.9 ...
$ day1_3 : num 2368.62 0 39.48 1.12 6.9 ...
$ day14_1 : num 627.758 0 14.471 0.867 5.692 ...
$ day14_2 : num 559.2 0 10.2 0 1.9 ...
$ day14_3 : num 611.434 0 31.691 0 0.655 ...
$ day2_1 : num 2145.22 0 300.56 1.71 57.38 ...
$ day2_2 : num 262.45 0 4.77 0 0 ...
$ day2_3 : num 745.84 0 123.9 5.26 38.9 ...
$ day3_1 : num 987.185 0 51.856 0.802 8.931 ...
$ day3_2 : num 1077.65 0 8.43 0 6.97 ...
$ day3_3 : num 1335.1 0 69.9 0 0 ...
$ day7_1 : num 1096.08 0 6.67 0 7.94 ...
$ day7_2 : num 1035.846 0 6.955 0.849 101.648 ...
$ day7_3 : num 1090.04 0 42.58 1.71 0.65 ...
$ normal_1: num 483.23 0 7.35 0.86 32.06 ...
$ normal_2: num 1842.1 0 11.2 0 10.4 ...
$ normal_3: num 475.7 0 1.03 0 0 ...
str(fa_expr_norm)'data.frame': 46679 obs. of 18 variables:
$ day1_1 : num 2711.2 0 42.8 15.5 0 ...
$ day1_2 : num 1094.75 0 13.49 4.29 17.16 ...
$ day1_3 : num 1331.588 0 21.921 0.562 3.935 ...
$ day14_1 : num 728.67 0 16.24 1.16 6.96 ...
$ day14_2 : num 603.85 0 10.8 0 2.16 ...
$ day14_3 : num 646.12 0 33.84 0 1.06 ...
$ day2_1 : num 1185.66 0 166.38 1.11 31.51 ...
$ day2_2 : num 1239.5 0 23.7 0 0 ...
$ day2_3 : num 829.6 0 137.9 5.56 43.37 ...
$ day3_1 : num 928.11 0 48.9 0.94 8.46 ...
$ day3_2 : num 1110.91 0 8.24 0 7.21 ...
$ day3_3 : num 951.1 0 49.9 0 0 ...
$ day7_1 : num 953.72 0 6.09 0 6.96 ...
$ day7_2 : num 1008.027 0 6.811 0.973 99.246 ...
$ day7_3 : num 1001.467 0 39.507 1.838 0.919 ...
$ normal_1: num 719.03 0 10.42 1.49 47.64 ...
$ normal_2: num 1247.3 0 7.45 0 6.77 ...
$ normal_3: num 809.4 0 1.7 0 0 ...
str(fa_meta)'data.frame': 18 obs. of 5 variables:
$ dataType : chr "1 transcriptome" "2 transcriptome" "3 transcriptome" "4 transcriptome" ...
$ sampleName : chr "day14_1" "day14_2" "day14_3" "day1_1" ...
$ condition : chr "day14" "day14" "day14" "day1" ...
$ sampleNumber: int 1 2 3 1 2 3 1 2 3 1 ...
$ color : chr "#FF4400" "#FF4400" "#FF4400" "#BBD7FF" ...
str(pfa_expr)'data.frame': 8044 obs. of 10 variables:
$ normal_1: num 531.3 221.6 26.1 4363.1 879.3 ...
$ normal_2: num 651.7 266.4 29.1 4784.1 1065.4 ...
$ day1_1 : num 335.6 175.4 57.7 4064.5 914.3 ...
$ day1_2 : num 334.8 159.4 45.8 3917.3 928.3 ...
$ day2_1 : num 197.2 234.8 81.6 4599 1000.1 ...
$ day2_2 : num 307.2 256.9 88.9 5957 1264.3 ...
$ day7_1 : num 123.2 149.9 29.9 2806.5 738.4 ...
$ day7_2 : num 272.6 315.1 44.6 6792.1 1362.1 ...
$ day14_1 : num 93.7 110.6 13.6 2022.5 466.4 ...
$ day14_2 : num 196.2 126.4 27.7 3226.8 714.6 ...
str(pfa_expr_norm)'data.frame': 8044 obs. of 10 variables:
$ normal_1: num 5.121 3.86 0.785 8.158 5.847 ...
$ normal_2: num 5.033 3.743 0.565 7.908 5.742 ...
$ day1_1 : num 4.51 3.57 1.97 8.1 5.95 ...
$ day1_2 : num 4.49 3.42 1.63 8.04 5.96 ...
$ day2_1 : num 3.63 3.88 2.36 8.17 5.97 ...
$ day2_2 : num 3.93 3.67 2.15 8.21 5.97 ...
$ day7_1 : num 3.53 3.81 1.49 8.04 6.11 ...
$ day7_2 : num 3.69 3.89 1.08 8.32 6 ...
$ day14_1 : num 3.665 3.904 0.895 8.095 5.979 ...
$ day14_2 : num 4.03 3.39 1.21 8.06 5.89 ...
str(pfa_meta)'data.frame': 12 obs. of 5 variables:
$ dataType : chr "proteome" "proteome" "proteome" "proteome" ...
$ sampleName : chr "normal_1" "normal_2" "day1_1" "day1_2" ...
$ condition : chr "normal" "normal" "day1" "day1" ...
$ sampleNumber: int 1 2 1 2 1 1 1 2 1 2 ...
$ color : chr "#BBFFBB" "#BBFFBB" "#FFFFDD" "#FFFFDD" ...
Je supprime le groupe day3 du fichier metadata pfa
pfa_meta <- pfa_meta %>% filter(condition != "day3")
str(pfa_meta)'data.frame': 10 obs. of 5 variables:
$ dataType : chr "proteome" "proteome" "proteome" "proteome" ...
$ sampleName : chr "normal_1" "normal_2" "day1_1" "day1_2" ...
$ condition : chr "normal" "normal" "day1" "day1" ...
$ sampleNumber: int 1 2 1 2 1 1 1 2 1 2
$ color : chr "#BBFFBB" "#BBFFBB" "#FFFFDD" "#FFFFDD" ...
Les deux fichiers fa ne donnent pas les observations de l’échantillon dans le même ordre:
fa_meta$sampleName == names(fa_expr) [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Donc je réorganise les échantillons dans l’ordre de l’expérience: condition normale, puis day 1 à 14 avec les 3 réplicats.
sample_order <- c(paste(rep(c("normal", "day1", "day2", "day3", "day7", "day14"), each = 3),
1:3, sep = "_"))
fa_expr <- fa_expr[,sample_order]
fa_meta <- fa_meta[match(sample_order, fa_meta$sampleName),]
fa_expr_norm <- fa_expr_norm[,sample_order]
fa_meta <- fa_meta[match(sample_order, fa_meta$sampleName),]J’ai maintenant les deux jeux de données avec pour chaque un fichier metadata et une table de counts raw ou normalized:
fa_expr
fa_expr_norm
fa_meta
pfa_expr
pfa_expr_norm
pfa_meta
head(fa_expr) normal_1 normal_2 normal_3 day1_1 day1_2 day1_3 day2_1 day2_2 day2_3 day3_1 day3_2 day3_3 day7_1 day7_2 day7_3 day14_1 day14_2 day14_3
ENSMUSG00000000001 483.2298191 1842.14841 475.696800 2278.80022 1786.498848 2368.618959 2145.223455 262.454849 745.843632 987.1850529 1077.645232 1335.116771 1096.075988 1035.8458456 1090.0375905 627.758017 559.156031 611.4338605
ENSMUSG00000000003 0.0000000 0.00000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0000000 0.000000 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.000000 0.0000000
ENSMUSG00000000028 7.3515305 11.19676 1.034465 36.27547 22.147861 39.484949 300.558779 4.771672 123.896184 51.8555945 8.434511 69.936866 6.665195 6.9552273 42.5783251 14.470759 10.167813 31.6910193
ENSMUSG00000000031 0.8603067 0.00000 0.000000 13.18853 7.151932 1.115304 1.711944 0.000000 5.260747 0.8022308 0.000000 0.000000 0.000000 0.8489214 1.7106814 0.867429 0.000000 0.0000000
ENSMUSG00000000037 32.0575944 10.42322 0.000000 0.00000 27.903214 6.897842 57.382077 0.000000 38.898261 8.9308534 6.966606 0.000000 7.939323 101.6481244 0.6500858 5.692254 1.901719 0.6549762
ENSMUSG00000000049 27.7647071 38.42445 15.903837 30.86001 4.861367 51.466810 10.796186 2.747397 1.805883 0.9468345 26.954566 7.666032 32.235700 6.7001171 33.9132330 26.152649 1.968290 55.8319868
head(fa_expr_norm) normal_1 normal_2 normal_3 day1_1 day1_2 day1_3 day2_1 day2_2 day2_3 day3_1 day3_2 day3_3 day7_1 day7_2 day7_3 day14_1 day14_2 day14_3
ENSMUSG00000000001 719.032852 1247.299971 809.375515 2711.22417 1094.745589 1331.5880610 1185.658698 1239.4695 829.596362 928.1069573 1110.908359 951.071644 953.722863 1008.0265279 1001.4667413 728.668808 603.846446 646.124076
ENSMUSG00000000003 0.000000 0.000000 0.000000 0.00000 0.000000 0.0000000 0.000000 0.0000 0.000000 0.0000000 0.000000 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.000000 0.000000
ENSMUSG00000000028 10.420766 7.448588 1.700369 42.82759 13.485108 21.9214582 166.379146 23.6540 137.895374 48.8972257 8.244218 49.868925 6.091296 6.8109901 39.5074036 16.244209 10.802262 33.839559
ENSMUSG00000000031 1.488681 0.000000 0.000000 15.46552 4.290716 0.5620887 1.105509 0.0000 5.560297 0.9403313 0.000000 0.000000 0.000000 0.9729986 1.8375537 1.160301 0.000000 0.000000
ENSMUSG00000000037 47.637787 6.771444 0.000000 0.00000 17.162865 3.9346207 31.507014 0.0000 43.370319 8.4629814 7.213691 0.000000 6.961481 99.2458551 0.9187768 6.961804 2.160452 1.057486
ENSMUSG00000000049 41.683064 25.731487 27.205900 36.87931 3.064797 28.6665222 6.080301 14.1924 2.224119 0.9403313 27.824235 5.699306 27.845923 6.8109901 31.2384121 30.167817 2.160452 59.219228
fa_meta dataType sampleName condition sampleNumber color
16 16 transcriptome normal_1 normal 1 #BBFFBB
17 17 transcriptome normal_2 normal 2 #BBFFBB
18 18 transcriptome normal_3 normal 3 #BBFFBB
4 4 transcriptome day1_1 day1 1 #BBD7FF
5 5 transcriptome day1_2 day1 2 #BBD7FF
6 6 transcriptome day1_3 day1 3 #BBD7FF
7 7 transcriptome day2_1 day1 1 #F0BBFF
8 8 transcriptome day2_2 day1 2 #F0BBFF
9 9 transcriptome day2_3 day1 3 #F0BBFF
10 10 transcriptome day3_1 day3 1 #FFFFDD
11 11 transcriptome day3_2 day3 2 #FFFFDD
12 12 transcriptome day3_3 day3 3 #FFFFDD
13 13 transcriptome day7_1 day7 1 #FFDD88
14 14 transcriptome day7_2 day7 2 #FFDD88
15 15 transcriptome day7_3 day7 3 #FFDD88
1 1 transcriptome day14_1 day14 1 #FF4400
2 2 transcriptome day14_2 day14 2 #FF4400
3 3 transcriptome day14_3 day14 3 #FF4400
head(pfa_expr) normal_1 normal_2 day1_1 day1_2 day2_1 day2_2 day7_1 day7_2 day14_1 day14_2
ENSMUSG00000037686 531.2680 651.7200 335.5910 334.8460 197.1740 307.194 123.2060 272.6190 93.7247 196.1590
ENSMUSG00000027831 221.6020 266.3590 175.4090 159.4190 234.8080 256.927 149.9380 315.0590 110.5880 126.3600
ENSMUSG00000039201 26.0723 29.1331 57.7329 45.8475 81.6009 88.870 29.8560 44.5586 13.6292 27.6568
ENSMUSG00000031095 4363.0500 4784.0800 4064.4800 3917.2900 4599.0300 5957.030 2806.5200 6792.0900 2022.5100 3226.7500
ENSMUSG00000034931 879.2790 1065.3900 914.2870 928.2760 1000.1000 1264.270 738.3520 1362.0700 466.4440 714.5870
ENSMUSG00000038208 68.2225 89.8871 57.9041 76.3510 84.8474 105.245 72.8696 138.9810 40.8101 59.2117
head(pfa_expr_norm) normal_1 normal_2 day1_1 day1_2 day2_1 day2_2 day7_1 day7_2 day14_1 day14_2
ENSMUSG00000037686 5.1206348 5.0327406 4.506491 4.492382 3.628542 3.932456 3.530692 3.685216 3.665305 4.026119
ENSMUSG00000027831 3.8601167 3.7429066 3.571461 3.422865 3.880248 3.674968 3.813620 3.893704 3.903714 3.392438
ENSMUSG00000039201 0.7849136 0.5645967 1.972258 1.630415 2.358439 2.146880 1.492104 1.081497 0.894504 1.208630
ENSMUSG00000031095 8.1578622 7.9080330 8.103830 8.039705 8.170497 8.208356 8.038369 8.322345 8.095098 8.064739
ENSMUSG00000034931 5.8472466 5.7415172 5.951775 5.962765 5.969607 5.972364 6.112220 6.004585 5.979016 5.890150
ENSMUSG00000038208 2.1641269 2.1791889 1.976512 2.363160 2.414548 2.390031 2.774423 2.714986 2.468220 2.301407
pfa_meta dataType sampleName condition sampleNumber color
1 proteome normal_1 normal 1 #BBFFBB
2 proteome normal_2 normal 2 #BBFFBB
3 proteome day1_1 day1 1 #FFFFDD
4 proteome day1_2 day1 2 #FFFFDD
5 proteome day2_1 day2 1 #BBD7FF
6 proteome day2_2 day2 1 #BBD7FF
9 proteome day7_1 day7 1 #FFDD88
10 proteome day7_2 day7 2 #FFDD88
11 proteome day14_1 day14 1 #FF4400
12 proteome day14_2 day14 2 #FF4400
#?DESeq
library("DESeq2")Préparation objet DESeq2
# id <- pfa_expr [,1]
# pfa_expr <- pfa_expr[,-1]
# rownames(pfa_expr) <- make.names(id, unique = TRUE)
#any (fa_DataMatrix [,-1] < 0)
fa_DataMatrix <- as.matrix(fa_expr)
dds <- DESeqDataSetFromMatrix(countData = round(fa_DataMatrix), colData = fa_meta, design = ~ condition)
ddsclass: DESeqDataSet
dim: 46679 18
metadata(1): version
assays(1): counts
rownames(46679): ENSMUSG00000000001 ENSMUSG00000000003 ... ENSMUSG00000109577 ENSMUSG00000109578
rowData names(0):
colnames(18): normal_1 normal_2 ... day14_2 day14_3
colData names(5): dataType sampleName condition sampleNumber color
Run fonction DESeq2
dds <- DESeq(dds)Table de résultats du DESeq2 entre les groupes normal et day7
res <- results(dds, contrast=c("condition", "normal", "day7"))
head(res)log2 fold change (MLE): condition normal vs day7
Wald test p-value: condition normal vs day7
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000000001 1061.10739 -0.0926767 0.371089 -0.249742 0.802787 0.936171
ENSMUSG00000000003 0.00000 NA NA NA NA NA
ENSMUSG00000000028 35.89103 -1.3988787 1.037433 -1.348404 0.177528 0.509671
ENSMUSG00000000031 1.85467 -1.0447828 2.258700 -0.462559 0.643680 NA
ENSMUSG00000000037 15.74259 -0.9804598 1.674099 -0.585664 0.558101 0.828730
ENSMUSG00000000049 20.97970 0.5186876 1.122010 0.462284 0.643877 0.872279
Combien de gènes sont significatifs > 0.05
table(res$padj < 0.05)
FALSE TRUE
20701 1764
Je classe la table par ordre décroissant de padj
orderedRes <- res[ order(res$padj), ]
head(orderedRes)log2 fold change (MLE): condition normal vs day7
Wald test p-value: condition normal vs day7
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000024164 14670.159 -8.19459 0.593794 -13.80039 2.53467e-43 5.69414e-39
ENSMUSG00000029304 224391.951 -5.14937 0.480753 -10.71105 9.03342e-27 9.62642e-23
ENSMUSG00000061947 621.939 -4.85093 0.454278 -10.67834 1.28552e-26 9.62642e-23
ENSMUSG00000027962 2261.581 -5.22527 0.503833 -10.37103 3.35873e-25 1.88635e-21
ENSMUSG00000003617 5729.876 -3.80553 0.383311 -9.92805 3.14337e-23 1.41232e-19
ENSMUSG00000066071 3575.949 2.88208 0.301427 9.56143 1.16138e-21 4.34841e-18
Je normalise la table
normCounts <- counts(dds, normalized = TRUE)
head(normCounts) normal_1 normal_2 normal_3 day1_1 day1_2 day1_3 day2_1 day2_2 day2_3 day3_1 day3_2 day3_3 day7_1 day7_2 day7_3 day14_1 day14_2 day14_3
ENSMUSG00000000001 719.032852 1247.299971 809.375515 2711.22417 1094.745589 1331.5880610 1185.658698 1239.4695 829.596362 928.1069573 1110.908359 951.071644 953.722863 1008.0265279 1001.4667413 728.668808 603.846446 646.124076
ENSMUSG00000000003 0.000000 0.000000 0.000000 0.00000 0.000000 0.0000000 0.000000 0.0000 0.000000 0.0000000 0.000000 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.000000 0.000000
ENSMUSG00000000028 10.420766 7.448588 1.700369 42.82759 13.485108 21.9214582 166.379146 23.6540 137.895374 48.8972257 8.244218 49.868925 6.091296 6.8109901 39.5074036 16.244209 10.802262 33.839559
ENSMUSG00000000031 1.488681 0.000000 0.000000 15.46552 4.290716 0.5620887 1.105509 0.0000 5.560297 0.9403313 0.000000 0.000000 0.000000 0.9729986 1.8375537 1.160301 0.000000 0.000000
ENSMUSG00000000037 47.637787 6.771444 0.000000 0.00000 17.162865 3.9346207 31.507014 0.0000 43.370319 8.4629814 7.213691 0.000000 6.961481 99.2458551 0.9187768 6.961804 2.160452 1.057486
ENSMUSG00000000049 41.683064 25.731487 27.205900 36.87931 3.064797 28.6665222 6.080301 14.1924 2.224119 0.9403313 27.824235 5.699306 27.845923 6.8109901 31.2384121 30.167817 2.160452 59.219228
Visualisation des estimations de dispersion de DESeq2
plotDispEsts(dds)Les points noirs sont la représentation “brute” des gènes (forte variabilité)
La ligne de tendance rouge (dépendance des dispersions à la moyenne)
Les points bleus représentent l’estimation de chaque gène vers la ligne rouge
Les cercles bleus au-dessus du «nuage» principal représentent des gènes avec des valeurs aberrantes de dispersion.
Distribution des pvalues
hist(orderedRes$pvalue, breaks=0:50/50, xlab="p value", main="Histogram of nominal p values")Heatmap des 20 gènes les plus différentiellement exprimés.
library(pheatmap)
# select the 20 most differentially expressed genes
select <- row.names(orderedRes[1:20, ])
# transform the counts to log10
log10_normCounts <- log10(normCounts + 1)
# get the values for the selected genes
values <- log10_normCounts[ select, ]
pheatmap(values,
scale = "none",
cluster_rows = FALSE,
cluster_cols = FALSE,
fontsize_row = 8,
annotation_names_col = FALSE,
#gaps_col = c(3,6),
display_numbers = TRUE,
number_format = "%.2f",
height=12,
width=6)Préparation objet DESeq2
# id <- pfa_expr [,1]
# pfa_expr <- pfa_expr[,-1]
# rownames(pfa_expr) <- make.names(id, unique = TRUE)
#any (fa_DataMatrix [,-1] < 0)
pfa_DataMatrix <- as.matrix(pfa_expr)
dds <- DESeqDataSetFromMatrix(countData = round(pfa_DataMatrix), colData = pfa_meta, design = ~ condition)
ddsclass: DESeqDataSet
dim: 8044 10
metadata(1): version
assays(1): counts
rownames(8044): ENSMUSG00000037686 ENSMUSG00000027831 ... ENSMUSG00000027523.1 ENSMUSG00000020741.1
rowData names(0):
colnames(10): 1 2 ... 11 12
colData names(5): dataType sampleName condition sampleNumber color
Run fonction DESeq2
dds <- DESeq(dds)Table de résultats du DESeq2 entre les groupes normal et day7
res <- results(dds, contrast=c("condition", "normal", "day7"))
head(res)log2 fold change (MLE): condition normal vs day7
Wald test p-value: condition normal vs day7
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000037686 283.1512 1.4928568 0.191451 7.797601 0.00000000000000630949 0.0000000000000637607
ENSMUSG00000027831 193.2636 -0.0228946 0.217834 -0.105101 0.91629551187216284891 0.9370443122182423590
ENSMUSG00000039201 42.2126 -0.5914299 0.411140 -1.438513 0.15028861831364714874 0.2159559924464054259
ENSMUSG00000031095 4050.7644 -0.1184521 0.129794 -0.912619 0.36144292218839429998 0.4507669559819292848
ENSMUSG00000034931 900.1919 -0.2292681 0.104342 -2.197286 0.02800003925436490848 0.0504100577792302648
ENSMUSG00000038208 76.4645 -0.5416567 0.276803 -1.956830 0.05036751097988109716 0.0836245526321744842
Combien de gènes sont significatifs > 0.05
table(res$padj < 0.05)
FALSE TRUE
3582 4462
Je classe la table par ordre décroissant de padj
orderedRes <- res[ order(res$padj), ]
head(orderedRes)log2 fold change (MLE): condition normal vs day7
Wald test p-value: condition normal vs day7
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000071715 1097.288 -2.96982 0.157649 -18.8382 3.67186e-79 1.47682e-75
ENSMUSG00000024757 963.642 2.91933 0.154778 18.8614 2.36703e-79 1.47682e-75
ENSMUSG00000053303 626.809 3.02825 0.163434 18.5290 1.20598e-76 3.23363e-73
ENSMUSG00000039519 2155.892 2.77257 0.153107 18.1087 2.72021e-73 5.47034e-70
ENSMUSG00000066097 932.324 2.61823 0.145835 17.9534 4.51718e-72 7.26724e-69
ENSMUSG00000032047 13304.389 1.93889 0.109767 17.6637 7.98453e-70 1.07046e-66
Je normalise la table
normCounts <- counts(dds, normalized = TRUE)
head(normCounts) 1 2 3 4 5 6 9 10 11 12
ENSMUSG00000037686 510.05841 481.36921 327.9377 325.17673 181.56837 226.98770 165.17223 185.04097 187.81855 240.38167
ENSMUSG00000027831 213.24476 196.38683 170.8009 154.33761 216.59171 190.01902 201.42954 213.50881 221.78573 154.53107
ENSMUSG00000039201 24.97461 21.41059 56.6083 44.65113 75.57668 65.80425 40.28591 30.50126 27.97298 34.34024
ENSMUSG00000031095 4190.93192 3532.00967 3966.4851 3802.14103 4238.74590 4404.44867 3769.41820 4603.65661 4042.09496 3957.71242
ENSMUSG00000034931 844.33398 786.28560 892.0687 900.78807 921.66686 934.56826 991.03336 923.17142 931.10047 876.90250
ENSMUSG00000038208 65.31821 66.44667 56.6083 73.77144 78.34168 77.63423 98.02904 94.21500 81.92086 72.35979
Visualisation des estimations de dispersion de DESeq2
plotDispEsts(dds)Distribution des pvalues
hist(orderedRes$pvalue, breaks=0:50/50, xlab="p value", main="Histogram of nominal p values")Heatmap des 20 gènes les plus différentiellement exprimés.
# select the 20 most differentially expressed genes
select <- row.names(orderedRes[1:20, ])
# transform the counts to log10
log10_normCounts <- log10(normCounts + 1)
# get the values for the selected genes
values <- log10_normCounts[ select, ]
pheatmap(values,
scale = "none",
cluster_rows = FALSE,
cluster_cols = FALSE,
fontsize_row = 8,
annotation_names_col = FALSE,
#gaps_col = c(3,6),
display_numbers = TRUE,
number_format = "%.2f",
height=12,
width=6)Il y a 18 échantillons pour l’expérience fa et 10 pour l’expérience pfa, il faut donc supprimer les 6 échantillons supplémentaire de la table fa.
Il faut également uniformiser la table metadata en conséquence.
head(fa_expr_norm) normal_1 normal_2 normal_3 day1_1 day1_2 day1_3 day2_1 day2_2 day2_3 day3_1 day3_2 day3_3 day7_1 day7_2 day7_3 day14_1 day14_2 day14_3
ENSMUSG00000000001 719.032852 1247.299971 809.375515 2711.22417 1094.745589 1331.5880610 1185.658698 1239.4695 829.596362 928.1069573 1110.908359 951.071644 953.722863 1008.0265279 1001.4667413 728.668808 603.846446 646.124076
ENSMUSG00000000003 0.000000 0.000000 0.000000 0.00000 0.000000 0.0000000 0.000000 0.0000 0.000000 0.0000000 0.000000 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.000000 0.000000
ENSMUSG00000000028 10.420766 7.448588 1.700369 42.82759 13.485108 21.9214582 166.379146 23.6540 137.895374 48.8972257 8.244218 49.868925 6.091296 6.8109901 39.5074036 16.244209 10.802262 33.839559
ENSMUSG00000000031 1.488681 0.000000 0.000000 15.46552 4.290716 0.5620887 1.105509 0.0000 5.560297 0.9403313 0.000000 0.000000 0.000000 0.9729986 1.8375537 1.160301 0.000000 0.000000
ENSMUSG00000000037 47.637787 6.771444 0.000000 0.00000 17.162865 3.9346207 31.507014 0.0000 43.370319 8.4629814 7.213691 0.000000 6.961481 99.2458551 0.9187768 6.961804 2.160452 1.057486
ENSMUSG00000000049 41.683064 25.731487 27.205900 36.87931 3.064797 28.6665222 6.080301 14.1924 2.224119 0.9403313 27.824235 5.699306 27.845923 6.8109901 31.2384121 30.167817 2.160452 59.219228
fa <- fa_expr_norm[,- c(3,6,9,10,11,12,15,18)]
head(fa) normal_1 normal_2 day1_1 day1_2 day2_1 day2_2 day7_1 day7_2 day14_1 day14_2
ENSMUSG00000000001 719.032852 1247.299971 2711.22417 1094.745589 1185.658698 1239.4695 953.722863 1008.0265279 728.668808 603.846446
ENSMUSG00000000003 0.000000 0.000000 0.00000 0.000000 0.000000 0.0000 0.000000 0.0000000 0.000000 0.000000
ENSMUSG00000000028 10.420766 7.448588 42.82759 13.485108 166.379146 23.6540 6.091296 6.8109901 16.244209 10.802262
ENSMUSG00000000031 1.488681 0.000000 15.46552 4.290716 1.105509 0.0000 0.000000 0.9729986 1.160301 0.000000
ENSMUSG00000000037 47.637787 6.771444 0.00000 17.162865 31.507014 0.0000 6.961481 99.2458551 6.961804 2.160452
ENSMUSG00000000049 41.683064 25.731487 36.87931 3.064797 6.080301 14.1924 27.845923 6.8109901 30.167817 2.160452
pfa <- pfa_expr_norm
head(pfa) normal_1 normal_2 day1_1 day1_2 day2_1 day2_2 day7_1 day7_2 day14_1 day14_2
ENSMUSG00000037686 5.1206348 5.0327406 4.506491 4.492382 3.628542 3.932456 3.530692 3.685216 3.665305 4.026119
ENSMUSG00000027831 3.8601167 3.7429066 3.571461 3.422865 3.880248 3.674968 3.813620 3.893704 3.903714 3.392438
ENSMUSG00000039201 0.7849136 0.5645967 1.972258 1.630415 2.358439 2.146880 1.492104 1.081497 0.894504 1.208630
ENSMUSG00000031095 8.1578622 7.9080330 8.103830 8.039705 8.170497 8.208356 8.038369 8.322345 8.095098 8.064739
ENSMUSG00000034931 5.8472466 5.7415172 5.951775 5.962765 5.969607 5.972364 6.112220 6.004585 5.979016 5.890150
ENSMUSG00000038208 2.1641269 2.1791889 1.976512 2.363160 2.414548 2.390031 2.774423 2.714986 2.468220 2.301407
metadata <- pfa_meta[,- 1]
metadata sampleName condition sampleNumber color
1 normal_1 normal 1 #BBFFBB
2 normal_2 normal 2 #BBFFBB
3 day1_1 day1 1 #FFFFDD
4 day1_2 day1 2 #FFFFDD
5 day2_1 day2 1 #BBD7FF
6 day2_2 day2 1 #BBD7FF
9 day7_1 day7 1 #FFDD88
10 day7_2 day7 2 #FFDD88
11 day14_1 day14 1 #FF4400
12 day14_2 day14 2 #FF4400
#c(normal_1,normal_2,day1_1,day1_2,day2_1,day2_2,day7_1,day7_2,day14_1,day14_2)
## Classer suivant ordre colonne Name de metadata
#Je retire le rawnames original (les chiffres 1,2,3...)
rownames(metadata) <- NULL
metadata sampleName condition sampleNumber color
1 normal_1 normal 1 #BBFFBB
2 normal_2 normal 2 #BBFFBB
3 day1_1 day1 1 #FFFFDD
4 day1_2 day1 2 #FFFFDD
5 day2_1 day2 1 #BBD7FF
6 day2_2 day2 1 #BBD7FF
7 day7_1 day7 1 #FFDD88
8 day7_2 day7 2 #FFDD88
9 day14_1 day14 1 #FF4400
10 day14_2 day14 2 #FF4400
# add the rownames as a proper column
library(tibble)
metadata <- column_to_rownames(metadata, var = "sampleName")
# ## Ordre des echantillons de data table idem metadata table
# fa <- fa[, row.names(metadata)]
# head(fa)
#
# pfa_expr_norm <- pfa_expr_norm[, row.names(metadata)]
# head(pfa_expr_norm)
#
# metadata
rownames(metadata) [1] "normal_1" "normal_2" "day1_1" "day1_2" "day2_1" "day2_2" "day7_1" "day7_2" "day14_1" "day14_2"
colnames(fa) [1] "normal_1" "normal_2" "day1_1" "day1_2" "day2_1" "day2_2" "day7_1" "day7_2" "day14_1" "day14_2"
colnames(pfa) [1] "normal_1" "normal_2" "day1_1" "day1_2" "day2_1" "day2_2" "day7_1" "day7_2" "day14_1" "day14_2"
Je regarde combien il y a de lignes avec des zéros dans la table fa et je les supprime et je filtre les genes où il y au moins de 3 échantillons avec des counts supérieurs ou égaux à 5
dim(fa)[1] 46679 10
#voir le nombre de lignes avec zero counts
rs <- rowSums(fa)
nbgenes_at_zeros <- length(which(rs==0))
nbgenes_at_zeros[1] 13269
#Je supprime les lignes avec zero counts
fa <- fa[rowSums(fa[, -1])>0, ]
#Je ne garde que les genes où il y a moins de 3 échantillons avec des counts supérieurs ou égaux à 5.
fa <- fa[rowSums((fa[, -1])>=5)>= 3 , ]
dim(fa)[1] 22459 10
Idem pour la table pfa
dim(pfa)[1] 8044 10
rs <- rowSums(pfa)
nbgenes_at_zeros <- length(which(rs==0))
nbgenes_at_zeros[1] 0
pfa <- pfa[rowSums((pfa[, -1])>=5)>= 3 , ]
dim(pfa)[1] 5277 10
Il faut également transposer les counts tables
fa_t <- t(fa)
str(fa_t) num [1:10, 1:22459] 719 1247 2711 1095 1186 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10] "normal_1" "normal_2" "day1_1" "day1_2" ...
..$ : chr [1:22459] "ENSMUSG00000000001" "ENSMUSG00000000028" "ENSMUSG00000000037" "ENSMUSG00000000049" ...
dim(fa_t)[1] 10 22459
class(fa_t)[1] "matrix" "array"
pfa_t <- t(pfa_expr_norm)
str(pfa_t) num [1:10, 1:8044] 5.12 5.03 4.51 4.49 3.63 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10] "normal_1" "normal_2" "day1_1" "day1_2" ...
..$ : chr [1:8044] "ENSMUSG00000037686" "ENSMUSG00000027831" "ENSMUSG00000039201" "ENSMUSG00000031095" ...
dim(pfa_t)[1] 10 8044
class(pfa_t)[1] "matrix" "array"
dim(metadata)[1] 10 3
J’ai donc maintenant les deux counts tables avec 10 échantillons
library(mixOmics)spca.fa_t <- spca(fa_t, ncomp=3, keepX=c(10,10,10))
spca.pfa_t <- spca(pfa_t, ncomp=3, keepX=c(10,10,10))
#?spca
plotIndiv(spca.fa_t, group = metadata$condition, comp= c(1,2), ind.names = FALSE, legend=TRUE,title = "Sparse PCA PlotIndiv PC1 & PC2 fa")plotIndiv(spca.pfa_t, group = metadata$condition, comp= c(1,2), ind.names = FALSE, legend=TRUE,title = "Sparse PCA PlotIndiv PC1 & PC2 Pfa")plotVar(spca.fa_t, var.names = FALSE, comp= c(1,2),title = "PlotVar PC1 & PC2 fa")plotVar(spca.pfa_t, var.names = FALSE, comp= c(1,2),title = "PlotVar PC1 & PC2 pfa")Analyses multivariées supervisées
W <- metadata$condition
plsda.fa_t <- mixOmics::plsda(fa_t, W, ncomp = 3)
plsda.pfa_t <- mixOmics::plsda(pfa_t, W, ncomp = 3)
#Error: could not find function "plsda" donc rajouter mixOmics::
mixOmics::plotIndiv(plsda.fa_t, comp= c(1,2), ind.names=FALSE, legend=TRUE,title = "PLS-DA PlotIndiv PC1 & PC2 fa")mixOmics::plotIndiv(plsda.pfa_t, comp= c(1,2), ind.names=FALSE, legend=TRUE,title = "PLS-DA PlotIndiv PC1 & PC2 fa")mixOmics::plotLoadings(plsda.fa_t, comp=1, contrib = "max")mixOmics::plotLoadings(plsda.fa_t, comp=2, contrib = "max")mixOmics::plotLoadings(plsda.pfa_t, comp=1, contrib = "max")mixOmics::plotLoadings(plsda.pfa_t, comp=2, contrib = "max")splsda.fa_t <- splsda(fa_t, W, ncomp = 3, keepX = c(10,10,10))
splsda.pfa_t <- splsda(pfa_t, W, ncomp = 3, keepX = c(10,10,10))
plotIndiv(splsda.fa_t, ind.names=FALSE, legend=TRUE,comp = c(1,2),title = "Sparse PLS-DA PlotIndiv PC1 & PC2 fa")plotIndiv(splsda.pfa_t, ind.names=FALSE, legend=TRUE,comp = c(1,2),title = "Sparse PLS-DA PlotIndiv PC1 & PC2 fa")plotVar(splsda.fa_t, var.names = TRUE, comp= c(1,2),title = "PlotVar PC1 & PC2 fa")plotVar(splsda.pfa_t, var.names = TRUE, comp= c(1,2),title = "PlotVar PC1 & PC2 pfa")plotLoadings(splsda.fa_t, comp=1, contrib = 'max')plotLoadings(splsda.fa_t, comp=2, contrib = 'max')plotLoadings(splsda.pfa_t, comp=1, contrib = 'max')plotLoadings(splsda.pfa_t, comp=2, contrib = 'max')pls.fa.pfa <- pls(fa_t,pfa_t)
plotIndiv(pls.fa.pfa,rep.space="XY-variate",
title = "plotIndiv PLS",
ind.names=FALSE,
group=metadata$condition,
pch = as.numeric(factor(metadata$condition)),
pch.levels =metadata$condition,
legend = TRUE)plotVar(pls.fa.pfa, var.names = c(FALSE, FALSE))spls.fa.pfa <- spls(fa_t,pfa_t, ncomp=10, keepX = c(10,10,10))
plotIndiv(spls.fa.pfa,rep.space="XY-variate",
title = "plotIndiv Sparse PLS",
ind.names=FALSE,
group=metadata$condition,
pch = as.numeric(factor(metadata$condition)),
pch.levels =metadata$condition,
legend = TRUE)plotVar(spls.fa.pfa, var.names = c(FALSE, FALSE))plotLoadings(spls.fa.pfa, comp=1, size.title = 1,name.var = NULL, max.name.length = 50)plotLoadings(spls.fa.pfa, comp=2, size.title = 1,name.var = NULL, max.name.length = 50)